High productivity multi-device exploitation with the Heterogeneous Programming Library

Authors

  • Moisés Viñas
  • Basilio B. Fraguela
  • Diego Andrade
  • Ramón Doallo
Abstract

Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multi-device applications must distribute kernel executions and, even worse, array portions that must be kept coherent among the different device memories and the host memory. In addition, when devices with different characteristics participate in a computation, optimally distributing the work among them is not trivial. In this paper we extend an existing framework for programming accelerators, the Heterogeneous Programming Library (HPL), with three kinds of improvements that facilitate these tasks. The first two are the ability to define subarrays and subkernels, which distribute kernels across different devices. The last one is a convenient extension of the subkernel mechanism that distributes computations among heterogeneous devices, seeking the best work balance among them. This last contribution includes two analytical models that have been shown to automatically provide very good work distributions. Our experiments also show the large programmability advantages of our approach and the negligible overhead incurred.
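To make the idea of balancing work among devices of different speeds concrete, the sketch below splits the rows of an array into contiguous per-device ranges in proportion to a relative throughput figure for each device. It is only an illustration under stated assumptions: the function `split_rows` and the throughput numbers are hypothetical, it does not use the HPL API, and it is a simple static proportional split rather than either of the analytical models mentioned in the abstract.

```python
def split_rows(total_rows, throughputs):
    """Split total_rows into contiguous [start, end) ranges, one per device.

    throughputs holds one positive relative speed per device (hypothetical
    values, e.g. measured in a short calibration run). Faster devices get
    proportionally more rows. This is NOT HPL code, only a generic sketch.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")

    total = sum(throughputs)
    bounds = [0]
    accumulated = 0
    for t in throughputs:
        accumulated += t
        # Round the cumulative share so the ranges stay contiguous and
        # rounding errors do not pile up on one device.
        bounds.append(round(total_rows * accumulated / total))
    bounds[-1] = total_rows  # guard against floating-point drift

    return [(bounds[i], bounds[i + 1]) for i in range(len(throughputs))]


if __name__ == "__main__":
    # Two hypothetical devices, the first three times faster than the second.
    for device, (start, end) in enumerate(split_rows(1000, [3, 1])):
        print(f"device {device}: rows [{start}, {end})")
```

Each returned range would correspond to the portion of the array (and of the kernel's iteration space) assigned to one device; keeping the ranges contiguous means each device needs only one array portion in its own memory.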


Similar resources

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parall...


A Generic Library for Stencil Computations

In this era of diverse and heterogeneous computer architectures, the programmability issues, such as productivity and portable efficiency, are crucial to software development and algorithm design. One way to approach the problem is to step away from traditional sequential programming languages and move toward domain specific programming environments to balance between expressivity and efficienc...


An interactive weighted fuzzy goal programming technique to solve multi-objective reliability optimization problem

This paper presents an application of interactive fuzzy goal programming to the nonlinear multi-objective reliability optimization problem, considering system reliability and cost of the system as objective functions. Because the decision maker always intends to produce a highly reliable system at minimum cost, we introduce the interactive method to design a high productivity sys...


A Dynamic Memory Allocator for heterogeneous platforms

Modern computers are built upon heterogeneous multi-core/many-core architectures (e.g. a GPGPU connected to a multi-core CPU). Achieving peak performance on these architectures is hard and may require a substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime support, have been proposed to relieve the programmer from worrying about low-level detail...


An efficient task-based approach for solving the n-body problem on multicore architectures

With the aim of exploring programming models that can improve the efficiency in high performance scientific computing software development for multicore architectures, we have implemented a task-based library with dynamic scheduling and automatic handling of data-dependencies. The library has been evaluated both from a performance and a programmer productivity perspective. We find that the appr...



Journal title:
  • J. Parallel Distrib. Comput.

Volume 101  Issue 

Pages  -

Publication date 2017